- 
                Notifications
    
You must be signed in to change notification settings  - Fork 374
 
Definition and shape inference for ONNXParallelOp and ONNXForkOp #2810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imaihal very clean interface. High level question before I go into the details of the PR; how are the output of the ONNX.Parallel defined? Is the first output the output of the first fork, and the second output the output of the second fork?
Because in mlir, parallel already exists in thescf dialect as the "equivalent" of omp parallel for, maybe we should name this construct ONNX.ParallelForks? Just to avoid future conflicts if we were to introduce a parallel loop.
In term of OMP, the provided functionality here is the closest to the omp parallel sections with omp section inside. So that could  be an alternative naming scheme (namely ONNX.ParallelSections and ONNX.Section).
| 
           @AlexandreEichenberger Thanks for your comments! 
 I didn't consider it. The output order of  Regarding naming, current   | 
    
          
 The way the values of each "yield" gets reflected into the output of the "parallel" needs to be defined in the text associated with the operations. If the current approach is that the "first" fork/yield produces the "first" output value of the "parallel", and the "second" fork/yield generates the "second" output value of the parallel, that is perfectly fine. We just need to make sure the fork are not reordered (I am sure they are not), and clearly explain this ordering in the definition/example associated with the new operations.  | 
    
| 
           Thanks for proposing these operations. I have some high-level comments. 
 There is a subtle difference between  It looks to me adding   | 
    
| 
           The other structural ONNX ops, such as ONNX.IF and ONNX.Loop, can read value from outside, but all the generated values can be used outside only through yield. Do the new ops have the same behavior?  | 
    
| 
           To me, the parallel op is a scheduling decision, and should be added in a later phase. They should be not be in onnx dialect, if there is another choice. By the way, can we use omp.sections for this purpose?  | 
    
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
          
 I updated the description for the operations. Thanks.  | 
    
          
 Yes. The ONNXParallel and ONNXFork have the same behavior. 
 We assume these operations are inserted just before ONNXToKrnl pass after applying optimizations in ONNX IR.(Currently we are using a rewriting tool for testing. We generate ONNXIR using -EmitONNXIR, then insert ONNXParallelOp and ONNXForkOp using the tool). I think omp.section is memref-level operation. We wanted tensor-level operations to parallelize in tensor level. Currently in PR #2756, we used krnl.parallel in memre-level reusing existing openMP parallelization, but we should be able to lower these operations into omp.section.  | 
    
This PR introduces new operations for Operator-level parallelization. Overall plan for implementation of Operator-level parallelization is described in issue #2811 . This PR is the first PR.
ONNXParallelOpONNXForkOpsshould be included in the body region. The threads created by ONNXForkOp are synchronized with main thread at the end of this operation. The results to be used in following operations need to be set as the results of this operation by using ONNXYieldOp.ONNXForkOpbodyregion of the operation.